Because I keep saying I will remember how to do things and then I don’t, I’m putting the bookmarked links and comments into one place to help me spend less time searching for answers.
cor()
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000
If the data has NAs in any of the values, cor() will return NA. To drop the NAs when calculating the correlation, use:
cor(..., use = "complete.obs")
Source: https://stackoverflow.com/questions/3798998/cor-shows-only-na-or-1-for-correlations-why
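A quick way to see this for myself: a minimal sketch using the built-in mtcars data (the same columns as the output above), with one mpg value set to NA on purpose. The `df` name and the choice of which value to blank out are just for illustration.

```r
# Subset of mtcars with one value knocked out to mimic missing data
df <- mtcars[, c("mpg", "cyl", "disp", "hp")]
df$mpg[1] <- NA

# Default behaviour: correlations between mpg and the other columns are NA
cor(df)

# Drop every row that contains an NA before computing anything
cor(df, use = "complete.obs")

# Drop rows pair by pair, so each correlation uses as much data as it can
cor(df, use = "pairwise.complete.obs")
```

With a single incomplete row the two `use` options give the same answer; they can differ when different columns are missing in different rows.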
read.csv()
## ï..Name.1 Name..2. Name..3
## 1 ch1 1 10
## 2 ch2 2 12
## 3 ch3 3 13
## 4 ch4 NA 14
## 5 ch5 5 15
## 6 6 16
## 7 ch7 7 17
Odd characters (ï..) in the first column name: read.csv("example.csv", fileEncoding = 'UTF-8-BOM') (Source)
No header row: header = FALSE; column names will be V1, V2, V3, etc.
readr read_csv()
## Parsed with column specification:
## cols(
## `Name 1` = col_character(),
## `Name (2)` = col_double(),
## `Name #3` = col_double()
## )
## # A tibble: 7 x 3
## `Name 1` `Name (2)` `Name #3`
## <chr> <dbl> <dbl>
## 1 ch1 1 10
## 2 ch2 2 12
## 3 ch3 3 13
## 4 ch4 NA 14
## 5 ch5 5 15
## 6 <NA> 6 16
## 7 ch7 7 17
No header row: col_names = FALSE; column names will be X1, X2, X3, etc.
R does a pretty good job of figuring out what the columns should be, but if you need to specify column types (or you don’t want the default col_types message to show), column types can be specified:
read_csv("example.csv"
  , col_types = cols(
      `Name 1` = col_character()
    , `Name (2)` = col_double()
    , `Name #3` = col_double()
  )
)
## # A tibble: 7 x 3
## `Name 1` `Name (2)` `Name #3`
## <chr> <dbl> <dbl>
## 1 ch1 1 10
## 2 ch2 2 12
## 3 ch3 3 13
## 4 ch4 NA 14
## 5 ch5 5 15
## 6 <NA> 6 16
## 7 ch7 7 17
You can use write_csv() or write.csv(); they have slightly different functionality.
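To check the row-name and missing-value behaviour of write.csv() without touching real files, here is a small base-R sketch. The toy data frame and the temporary file are made up for the example; write_csv() is not shown because it needs the readr package loaded.

```r
# Toy data with one missing value in each column
df <- data.frame(name = c("ch1", NA), value = c(1, NA), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")

# Defaults: row names are written as an unnamed first column, NA is written as NA
write.csv(df, path)
default_lines <- readLines(path)
default_lines

# No row names, and missing values exported as blank cells
write.csv(df, path, row.names = FALSE, na = "")
clean_lines <- readLines(path)
clean_lines
```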
test <- read_csv("example.csv"
  , col_types = cols(
      `Name 1` = col_character()
    , `Name (2)` = col_double()
    , `Name #3` = col_double()
  )
)
write_csv(test, "example_export1.csv")
write.csv(test, "example_export2.csv")
Row names; row.names = TRUE to include; row.names = FALSE to exclude
write.csv() default includes row names (usually the row number)
write_csv() default does not include row names; CANNOT ADD
NA values; na = "" to have missing data exported as a blank cell
write.csv() default is na = "NA" for numeric and character
write_csv() default is na = "NA" for numeric and character

| String | Meaning | Code | Output |
|---|---|---|---|
| %a | Day of the week, abbreviated (Mon-Sun) | format.Date("2020-12-10", "%a") | Thu |
| %A | Day of the week, full (Monday-Sunday) | format.Date("2020-12-10", "%A") | Thursday |
| %w | Day of the week, numeric, 0 = Sunday (0-6) | format.Date("2020-12-10", "%w") | 4 |
| %e | Day of month (1-31) | format.Date("2020-12-10", "%e") | 10 |
| %d | Day of month (01-31) | format.Date("2020-12-10", "%d") | 10 |
| %m | Month, numeric (01-12) | format.Date("2020-12-10", "%m") | 12 |
| %b | Month, abbreviated (Jan-Dec) | format.Date("2020-12-10", "%b") | Dec |
| %B | Month, full (January-December) | format.Date("2020-12-10", "%B") | December |
| %y | Year, without century (00-99) | format.Date("2020-12-10", "%y") | 20 |
| %Y | Year, with century (0000-9999) | format.Date("2020-12-10", "%Y") | 2020 |
| %j | Day of the year (001-366) | format.Date("2020-12-10", "%j") | 345 |
| %U | Week of year, numeric, starting on Sunday (00-53) | format.Date("2020-12-10", "%U") | 49 |
| %W | Week of year, numeric, starting on Monday (00-53) | format.Date("2020-12-10", "%W") | 49 |
| %x | Locale-specific date | format.Date("2020-12-10", "%x") | 12/10/2020 |

| String | Meaning | Code | Output |
|---|---|---|---|
| %S | Second (00-59) | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%S") | 10 |
| %M | Minute (00-59) | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%M") | 30 |
| %l | Hour, in 12-hour clock (1-12) | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%l") | 3 |
| %I | Hour, in 12-hour clock (01-12) | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%I") | 03 |
| %p | AM/PM | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%p") | PM |
| %H | Hour, in 24-hour clock (00-23) | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%H") | 15 |
| %X | Locale-specific time | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%X") | 3:30:10 PM |
| %c | Locale-specific date and time | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%c") | Thu Dec 10 15:30:10 2020 |
| %z | Offset from GMT | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%z") | -0600 |
| %Z | Time zone (character) | format.Date(as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago"), "%Z") | CST |
The above examples use Central time, so I can use tz = "America/Chicago"; other time zone options can be found using the code below:
## [1] "America/Los_Angeles"
## [1] "Africa/Abidjan" "Africa/Accra" "Africa/Addis_Ababa"
## [4] "Africa/Algiers" "Africa/Asmara" "Africa/Asmera"
## [7] "Africa/Bamako" "Africa/Bangui" "Africa/Banjul"
## [10] "Africa/Bissau" "Africa/Blantyre" "Africa/Brazzaville"
## [13] "Africa/Bujumbura" "Africa/Cairo" "Africa/Casablanca"
## [16] "Africa/Ceuta" "Africa/Conakry" "Africa/Dakar"
## [19] "Africa/Dar_es_Salaam" "Africa/Djibouti"
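The calls that produced the output above did not survive in these notes. My best guess (an assumption, not a record of what was run) is that they were the base R functions Sys.timezone() and OlsonNames(). A minimal sketch along those lines, plus one conversion to another zone:

```r
# Time zone of the current machine (varies from system to system)
Sys.timezone()

# First 20 time zone names R knows about
head(OlsonNames(), 20)

# Narrow the list down, e.g. to zones in the Americas
america_zones <- grep("^America/", OlsonNames(), value = TRUE)

# Show a Central-time timestamp as Pacific time instead
x <- as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago")
pacific <- format(x, "%Y-%m-%d %H:%M", tz = "America/Los_Angeles")
pacific
```

Any name returned by OlsonNames() can be passed as the tz argument.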
Why do I always forget the direction of these?
hjust: 0 = left-aligned, 0.5 = center, 1 = right-aligned
vjust: 0 = bottom-aligned, 0.5 = middle, 1 = top-aligned
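A small sketch to remind myself what the numbers do in practice, assuming ggplot2 is installed. The `pts` data frame and the labels are invented for the example.

```r
library(ggplot2)

# Three made-up points with labels
pts <- data.frame(x = 1:3, y = 1:3, label = c("a", "b", "c"))

p <- ggplot(pts, aes(x, y)) +
  geom_point() +
  # hjust = 0 puts the left edge of the label on the point,
  # vjust = 0 puts the bottom edge on the point,
  # so the text ends up above and to the right of it
  geom_text(aes(label = label), hjust = 0, vjust = 0) +
  ggtitle("Title pushed to the right edge") +
  # hjust = 1 right-aligns the title
  theme(plot.title = element_text(hjust = 1))

p
```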
quote()
ggplot(mpg, aes(displ, hwy))+geom_point()+
  ggtitle(
    quote(
      alpha ^ 2 - frac(1, 10) + sum(n[i], i==1, N)
    )
  )
TeX() from the latex2exp package
## Warning: package 'latex2exp' was built under R version 4.0.3
ggplot(mpg, aes(displ, hwy))+geom_point()+
  ggtitle(TeX(
    "$\\alpha^2 - \\frac{1}{10} + \\sum_{i}^N n_i$"
  )
  )
Sometimes I’m working on two different types of plots (like a bar chart and a scatter plot) that happen to have the same x-axis. I want to line up these axes so that when the plots are stacked the values correspond to the same date.
gridExtra::grid.arrange() and cowplot::plot_grid()
# two different bar charts
A <- ggplot(mpg, aes(class))+geom_bar()+coord_flip()+ylim(0, 109)
B <- ggplot(mpg, aes(drv))+geom_bar()+coord_flip()+ylim(0, 109)
Using the grid.arrange() command from the gridExtra package does not line up the axes.
Use the grid.draw() command from the grid package instead.
Source
# make plots into grobs (grid graphical objects)
gA <- ggplotGrob(A)
gB <- ggplotGrob(B)
grid::grid.draw(rbind(gA, gB))
The cowplot::plot_grid() function allows you to line up plots by a specific axis.
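For the cowplot route, a minimal sketch using the same two bar charts, assuming the ggplot2 and cowplot packages are installed. The axis = "lr" argument is my choice here for lining up both the left and right edges; the `aligned` name is just for the example.

```r
library(ggplot2)

# Same two bar charts as above
A <- ggplot(mpg, aes(class)) + geom_bar() + coord_flip() + ylim(0, 109)
B <- ggplot(mpg, aes(drv)) + geom_bar() + coord_flip() + ylim(0, 109)

# Stack the plots in one column and align them vertically,
# matching up the left and right axes
aligned <- cowplot::plot_grid(A, B, ncol = 1, align = "v", axis = "lr")
aligned
```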
Another option is facet_wrap() or facet_grid(), which can work if the axes are the same for the different variables you want to compare, but be careful: facets are supposed to compare items with the same measurements.
tidy.df <- pivot_longer(mpg, c(class, drv), names_to = "category", values_to = "type")
ggplot(tidy.df, aes(type))+
  geom_bar()+
  coord_flip()+
  facet_wrap(
    ~category
    , ncol = 1
    , scales = "free" #removes types from the axis if that category has 0 cars of that type
  )
ggplot(tidy.df, aes(type))+
  geom_bar()+
  coord_flip()+
  facet_grid(
    category ~ .
    , scales = "free" #removes types from the axis if that category has 0 cars of that type
    , space = "free" #spaces based on number of obs (i.e. number of bars);
                     # rather than giving each facet equal sizing
  )
Scatter plots and bar charts will not line up automatically, even when using the grid.draw command detailed above. This is because their default limits are different given that the bar chart is centered on the value and the scatter plot is a single point on the value.
# work with a smaller subset of the economics data, part of the ggplot2 package
startdate <- "2014-06-01"
economics_small <- economics %>%
  filter(date >= as.Date(startdate)) %>%
  arrange(date)
A <- ggplot(economics_small, aes(date, unemploy))+
  geom_bar(stat="identity")+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)
B <- ggplot(economics_small, aes(date, uempmed))+
  geom_point()+geom_line()+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)
gA <- ggplotGrob(A)
gB <- ggplotGrob(B)
grid::grid.draw(rbind(gA, gB)) #cowplot::plot_grid(A, B, ncol = 1, align = "v") produces same result
In order to line them up there are a couple of options.
If you make the limit the first x-value, the bar chart will not show up (remember it’s centered over the value).
A <- ggplot(economics_small, aes(date, unemploy))+
  geom_bar(stat="identity")+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)+
  xlim(as.Date(startdate), NA)
B <- ggplot(economics_small, aes(date, uempmed))+
  geom_point()+geom_line()+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)+
  xlim(as.Date(startdate), NA)
gA <- ggplotGrob(A)
## Warning: Removed 1 rows containing missing values (geom_bar).
This can be fixed by adding a half unit to the x-axis (i.e. having the lower limit be half-unit lower than smallest x-value). In this case the unit is a month, so a half-unit would be ~15 days.
## Time difference of 15 days
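The code that created HalfUnit is missing from these notes, so here is one way it could be computed. This is an assumption on my part, based on the data being monthly and the output reading 15 days; the `next_month` helper name is made up.

```r
startdate <- "2014-06-01"

# Assumed: one unit on the x-axis is one month, from startdate
# to the first day of the following month
next_month <- seq(as.Date(startdate), by = "month", length.out = 2)[2]

# Half of that gap; subtracting Dates gives a difftime in days
HalfUnit <- (next_month - as.Date(startdate)) / 2
HalfUnit
```

June has 30 days, so this prints a time difference of 15 days. Because HalfUnit is a difftime, it can be subtracted from a Date directly, and as.vector() turns it into a plain number where one is needed.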
A <- ggplot(economics_small, aes(date, unemploy))+
  geom_bar(stat="identity")+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)+
  xlim(as.Date(startdate)-HalfUnit, NA)
B <- ggplot(economics_small, aes(date, uempmed))+
  geom_point()+geom_line()+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)+
  xlim(as.Date(startdate)-HalfUnit, NA)
gA <- ggplotGrob(A)
gB <- ggplotGrob(B)
grid::grid.draw(rbind(gA, gB))
Bar charts are automatically centered over the x-value. Bar charts (and any geom object) can be shifted using position = position_nudge(). The shift needs to be half a unit on the x-axis; again, this is monthly data, so a half unit is ~15 days.
Source
A <- ggplot(economics_small, aes(date, unemploy))+
  geom_bar(stat="identity", position = position_nudge(x = as.vector(HalfUnit)))+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)
B <- ggplot(economics_small, aes(date, uempmed))+
  geom_point()+geom_line()+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)
gA <- ggplotGrob(A)
gB <- ggplotGrob(B)
grid::grid.draw(rbind(gA, gB))
The test expression goes in parentheses () and the statement goes in curly brackets {}
if (test) { statement }
R is a bit finicky about where the brackets go; I get errors when I put else on a new line by itself. It wants the right bracket before it: } else
if (test) {
  statement #1
} else {
  statement #2
}
if (test1) {
  statement #1
} else if (test2) {
  statement #2
} else if (test3) {
  statement #3
} else {
  statement #4
}
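A runnable version of the same pattern, as a minimal sketch. The function name and the size labels are invented for the example. Note that each else sits on the same line as the closing bracket before it, and every else if needs its own test in parentheses.

```r
# Classify a single cylinder count; works on one value at a time
describe_cyl <- function(cyl) {
  if (cyl == 4) {
    "small"
  } else if (cyl == 6) {
    "medium"
  } else if (cyl == 8) {
    "large"
  } else {
    "unknown"
  }
}

describe_cyl(6)
describe_cyl(5)
```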
ifelse()
Automatically works for vectors, so this is preferred when making adjustments to data set variables.
ifelse(condition, value if true, value if false)
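For example, a minimal sketch on the built-in mtcars data. The cut-off of 20 mpg and the "high"/"low" labels are arbitrary choices for illustration.

```r
# One test applied to the whole column at once; no loop needed
mpg_band <- ifelse(mtcars$mpg >= 20, "high", "low")

# Count how many cars fall into each band
table(mpg_band)
```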
case_when() (instead of nested ifelse() statements)
case_when(
    x == val1 ~ output1
  , x == val2 ~ output2
  , x == val3 ~ output3
  #if x doesn't fit into the above values, a catch-all output can be set
  #if a catch-all output is not defined, output will be NA for x's that don't meet the conditions
  , TRUE ~ everythingelse
)
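A concrete version of the template above, as a sketch that assumes the dplyr package is installed. The values and labels are made up for illustration.

```r
library(dplyr)

cyl <- c(4, 6, 8, 5)

size <- case_when(
    cyl == 4 ~ "small"
  , cyl == 6 ~ "medium"
  , cyl == 8 ~ "large"
  # catch-all for anything not matched above
  , TRUE ~ "other"
)

size
```

Conditions are checked in order and the first match wins, so the catch-all goes last.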

Resizing images in Markdown is required if you are knitting to a PDF, because you can’t use HTML code.
Tips and tricks for working with images and figures in R Markdown documents - hollie@zevross.com
Adjust the out.width and out.height in the R chunk options
{r, out.width="50%"}
img <- "img/Peegs.jpg" #path to image
knitr::include_graphics(img) #in the knitr package
In my opinion, HTML is a lot easier to use for image options.
<img src="img/Peegs.jpg" alt="this is Daffodil and Blossom" width="50%">
R Studio Support - Installing older versions of packages
packageurl <- "https://cran.r-project.org/src/contrib/Archive/<package>/<package>_<version>.tar.gz"
install.packages(packageurl, repos=NULL, type="source")
If a program was built entirely in an older version of R, it may be difficult to get it to work with an updated version of R. When there isn’t time to investigate and re-code, installing an older version of R is possible.